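Using deep learning techniques, anomalies in the paranasal sinus system can be detected automatically in MRI images and then further analyzed and classified based on their volume, shape, and other parameters such as local contrast. However, because training data are limited, traditional supervised learning methods often fail to generalize. Existing deep learning methods for paranasal anomaly classification have been used to diagnose at most one anomaly. In our work, we consider three anomalies. Specifically, we employ a 3D CNN to separate maxillary sinus volumes without anomalies from maxillary sinus volumes with anomalies. To learn robust representations from a small labeled dataset, we propose a novel learning paradigm that combines a contrastive loss and a cross-entropy loss. In particular, we use a supervised contrastive loss that encourages the embeddings of maxillary sinus volumes with and without anomalies to form two distinct clusters, while the cross-entropy loss encourages the 3D CNN to maintain its discriminative ability. We report that optimizing with both losses is advantageous over optimizing with only one loss. We also find that our training strategy improves label efficiency. With our method, a 3D CNN classifier achieves an AUROC of 0.85, whereas a 3D CNN classifier optimized with the cross-entropy loss alone achieves an AUROC of 0.66.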
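The combination of a supervised contrastive loss with a cross-entropy loss can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the `temperature` and `contrastive_weight` parameters, and the simple weighted sum are all hypothetical, and the contrastive term follows the commonly used supervised contrastive formulation over L2-normalized embeddings.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Mean supervised contrastive loss over all anchors that have a positive."""
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a same-label partner contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        log_denominator = math.log(sum(math.exp(s) for s in sims.values()))
        total += -sum(sims[p] - log_denominator for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def cross_entropy_loss(logits, labels):
    """Mean cross-entropy of integer class labels given unnormalized logits."""
    total = 0.0
    for row, target in zip(logits, labels):
        peak = max(row)  # subtract the maximum for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_sum_exp - row[target]
    return total / len(labels)


def combined_loss(embeddings, logits, labels, contrastive_weight=1.0, temperature=0.1):
    """Assumed combination: cross-entropy plus a weighted contrastive term."""
    contrastive = supervised_contrastive_loss(embeddings, labels, temperature)
    return cross_entropy_loss(logits, labels) + contrastive_weight * contrastive
```

In this sketch the contrastive term acts on the embeddings and the cross-entropy term acts on the classifier logits, so embeddings that already cluster by label yield a smaller contrastive term than embeddings whose labels are mixed across clusters.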
translated by 谷歌翻译
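Radiographs are a versatile diagnostic tool for detecting and assessing pathologies, for treatment planning, and for navigation and localization purposes. However, their interpretation and assessment by radiologists can be tedious and error-prone. Consequently, a wide variety of deep learning methods have been proposed to support radiologists in interpreting radiographs. Most of these approaches rely on convolutional neural networks (CNNs) to extract features from images. CNNs have proven particularly well suited to the multi-label classification of pathologies on chest radiographs (chest X-rays, CXR). In contrast, Vision Transformers (ViTs) have not yet been applied to this task, despite their high classification performance on generic images and their interpretable local saliency maps, which could add value to clinical interventions. ViTs do not rely on convolutions but on patch-based self-attention and, unlike CNNs, carry no prior knowledge of local connectivity. Although this leads to increased capacity, ViTs typically require an excessive amount of training data, which is a hurdle in the medical domain because collecting large medical datasets is costly. In this work, we systematically compare the classification performance of ViTs and CNNs for different dataset sizes and evaluate more data-efficient ViT variants (DeiT). Our results show that while the performance of ViTs and CNNs is on par, DeiTs outperform the former if a reasonably large dataset is available for training.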
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing or speech recognition, it remains to be shown in robotics, where the generalization capabilities of the models are particularly critical due to the difficulty of collecting real-world robotic data. We argue that one of the keys to the success of such general robotic models lies with open-ended task-agnostic training, combined with high-capacity architectures that can absorb all of the diverse, robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks. The project's website and videos can be found at robotics-transformer.github.io
Identifying statistical regularities in solutions to some tasks in multi-task reinforcement learning can accelerate the learning of new tasks. Skill learning offers one way of identifying these regularities by decomposing pre-collected experiences into a sequence of skills. A popular approach to skill learning is maximizing the likelihood of the pre-collected experience with latent variable models, where the latent variables represent the skills. However, there are often many solutions that maximize the likelihood equally well, including degenerate solutions. To address this underspecification, we propose a new objective that combines the maximum likelihood objective with a penalty on the description length of the skills. This penalty incentivizes the skills to maximally extract common structures from the experiences. Empirically, our objective learns skills that solve downstream tasks in fewer samples compared to skills learned from only maximizing likelihood. Further, while most prior works in the offline multi-task setting focus on tasks with low-dimensional observations, our objective can scale to challenging tasks with high-dimensional image observations.
In this paper, we introduce a simple and novel framework for one-shot audio-driven talking head generation. Unlike prior works that require additional driving sources for controlled synthesis in a deterministic manner, we instead probabilistically sample all the holistic lip-irrelevant facial motions (i.e., pose, expression, blink, gaze, etc.) to semantically match the input audio while still maintaining both the photo-realism of audio-lip synchronization and the overall naturalness. This is achieved by our newly proposed audio-to-visual diffusion prior, trained on top of the mapping between audio and disentangled non-lip facial representations. Thanks to the probabilistic nature of the diffusion prior, one big advantage of our framework is that it can synthesize diverse facial motion sequences given the same audio clip, which is quite user-friendly for many real applications. Through comprehensive evaluations on public benchmarks, we conclude that (1) our diffusion prior significantly outperforms an auto-regressive prior on almost all the concerned metrics; (2) our overall system is competitive with prior works in terms of audio-lip synchronization but can effectively sample rich and natural-looking lip-irrelevant facial motions that remain semantically harmonized with the audio input.
To date, the comparison of Statistical Shape Models (SSMs) is often solely performance-based and carried out by means of simplistic metrics such as compactness, generalization, or specificity. Any similarities or differences between the actual shape spaces can neither be visualized nor quantified. In this paper, we present a first method to compare two SSMs in dense correspondence by computing approximate intersection spaces and set-theoretic differences between the affine vector spaces spanned by the models. To this end, we approximate the distribution of shapes lying in the intersection space using Markov Chain Monte Carlo, and then apply Principal Component Analysis (PCA) to its samples. By representing the resulting spaces again as an SSM, our method enables an easy and intuitive analysis of similarities between two models' shape spaces. We estimate differences between SSMs in a similar manner; here, however, the resulting shape spaces are no longer linear vector spaces, so we do not apply PCA but instead use the posterior samples for visualization. We showcase the proposed algorithm qualitatively by computing and analyzing intersection spaces and differences between publicly available face models, focusing on gender-specific male and female as well as identity and expression models. Our quantitative evaluation based on SSMs built from synthetic and real-world data sets provides detailed evidence that the introduced method is able to recover ground-truth intersection spaces and differences. Finally, we demonstrate that the proposed algorithm can be easily adapted to also compute intersections and differences between color spaces.
A growing ecosystem of large, open-source foundation models has reduced the labeled data and technical expertise necessary to apply machine learning to many new problems. Yet foundation models pose a clear dual-use risk, indiscriminately reducing the costs of building both harmful and beneficial machine learning systems. To mitigate this risk, we propose the task blocking paradigm, in which foundation models are trained with an additional mechanism to impede adaptation to harmful tasks while retaining good performance on desired tasks. We call the resulting models self-destructing models, inspired by mechanisms that prevent adversaries from using tools for harmful purposes. We present an algorithm for training self-destructing models leveraging techniques from meta-learning and adversarial learning, showing that it can largely prevent a BERT-based model from learning to perform gender identification without harming the model's ability to perform profession classification. We conclude with a discussion of future directions.
Large language models have recently shown promising progress in mathematical reasoning when fine-tuned with human-generated sequences walking through a sequence of solution steps. However, the solution sequences are not formally structured and the resulting model-generated sequences may not reflect the kind of systematic reasoning we might expect an expert human to produce. In this paper, we study how to build stronger reasoning capability in language models using the idea of relational abstractions. We introduce new types of sequences that more explicitly provide an abstract characterization of the transitions through intermediate solution steps to the goal state. We find that models that are supplied with such sequences as prompts can solve tasks with a significantly higher accuracy, and models that are trained to produce such sequences solve problems better than those that are trained with previously used human-generated sequences and other baselines. Our work thus takes several steps toward elucidating and improving how language models perform on tasks requiring multi-step mathematical reasoning.
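Combining multiple tasks in a prioritized, ordered manner is desirable, since it allows for modular design and facilitates data reuse through knowledge transfer. In control theory, prioritized composition is realized by null-space control, where low-priority control actions are projected into the null space of high-priority control actions. Such a method is currently unavailable for reinforcement learning. We propose a novel, task-prioritized composition framework for reinforcement learning, which involves a novel concept: the indifference space of reinforcement learning policies. Our framework has the potential to facilitate knowledge transfer and modular design while greatly increasing data efficiency and data reuse for reinforcement learning agents. Furthermore, our approach can ensure high-priority constraint satisfaction, which makes it promising for learning in safety-critical domains such as robotics. Unlike null-space control, our approach allows learning globally optimal policies for the compound task by learning online in the indifference space of higher-level policies after the initial compound policy has been constructed.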
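The classical null-space projection that the abstract contrasts with can be illustrated with a small sketch. It covers only the control-theory baseline, not the proposed indifference-space framework, and it assumes the simplest case of a single scalar high-priority task whose Jacobian is one nonzero row vector; the function names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_into_null_space(jacobian_row, action):
    """Remove the component of `action` that would affect the high-priority task.

    For a single nonzero row Jacobian j, the null-space projector is
    N = I - j^T j / (j j^T), so N @ action = action - j * (j . action) / (j . j).
    """
    scale = dot(jacobian_row, action) / dot(jacobian_row, jacobian_row)
    return [a - scale * j for a, j in zip(action, jacobian_row)]


def compose_prioritized(jacobian_row, high_priority_action, low_priority_action):
    """Add the projected low-priority action to the high-priority action."""
    projected = project_into_null_space(jacobian_row, low_priority_action)
    return [h + p for h, p in zip(high_priority_action, projected)]


# The high-priority task only observes the first coordinate.
jacobian_row = [1.0, 0.0, 0.0]
high = [2.0, 0.0, 0.0]
low = [5.0, 3.0, -1.0]
composed = compose_prioritized(jacobian_row, high, low)
```

Because the projected low-priority action is orthogonal to the Jacobian row, the composed action produces the same high-priority task velocity as the high-priority action on its own.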